Fix data race when migrating slots #324
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files

Coverage Diff (unstable vs. #324):

| | unstable | #324 | +/- |
|---|---|---|---|
| Coverage | 43.38% | 47.08% | +3.69% |
| Files | 37 | 45 | +8 |
| Lines | 2971 | 4420 | +1449 |
| Hits | 1289 | 2081 | +792 |
| Misses | 1544 | 2130 | +586 |
| Partials | 138 | 209 | +71 |

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Pull Request Overview
This PR aims to resolve migration inconsistencies by resetting shard migration state per-shard, ensuring slot-only migrations clear previous migration info, and serializing the MigrateSlot API to avoid concurrent conflicts.
- Clear migration state for source shard on slot-only migrations
- Serialize MigrateSlot API calls with a handler-level mutex and manual cluster lookup (sketched below)
- Adjust routing middleware and change loop error handling to continue on per-shard failures
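
For orientation, here is a minimal sketch of the handler-level serialization described above, assuming a gin-based handler and a simplified store interface. `ClusterStore`, `GetCluster`, and the route parameter names are illustrative stand-ins, not the project's actual definitions.

```go
package api

import (
	"net/http"
	"sync"

	"github.com/gin-gonic/gin"
)

// ClusterStore is a stand-in for the project's cluster store, assumed here for illustration.
type ClusterStore interface {
	GetCluster(namespace, cluster string) (any, error)
}

// ClusterHandler sketches the handler-level mutex described in the overview.
type ClusterHandler struct {
	store ClusterStore
	mu    sync.Mutex // serializes every MigrateSlot call on this handler
}

func (h *ClusterHandler) MigrateSlot(c *gin.Context) {
	h.mu.Lock()
	defer h.mu.Unlock()

	// Manual cluster lookup replaces the RequiredCluster middleware.
	cluster, err := h.store.GetCluster(c.Param("namespace"), c.Param("cluster"))
	if err != nil {
		c.JSON(http.StatusNotFound, gin.H{"error": err.Error()})
		return
	}
	_ = cluster // ... perform the slot migration while holding the lock ...
	c.JSON(http.StatusOK, gin.H{"data": "ok"})
}
```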
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| store/cluster.go | Clear migration state before moving slots when slotOnly is true |
| server/route.go | Removed RequiredCluster middleware from the migrate endpoint |
| server/api/cluster.go | Added sync.Mutex, switched to manual cluster retrieval, and serialized MigrateSlot calls |
| controller/cluster.go | Replaced early return with continue in shard status update loop (see sketch below) |
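
To illustrate the controller/cluster.go change in the last row, a minimal sketch of the continue-on-failure loop, assuming hypothetical `Shard` and `updateShardStatus` stand-ins for the project's real types and helpers:

```go
package controller

import "log"

// Shard is a simplified stand-in for the project's shard type.
type Shard struct{ Name string }

// updateShardStatus is a placeholder for the real per-shard status update.
func updateShardStatus(s Shard) error { return nil }

// updateAllShards shows the "continue on per-shard failure" pattern: a failing
// shard is logged and skipped instead of aborting the whole status pass with
// an early return.
func updateAllShards(shards []Shard) {
	for i, shard := range shards {
		if err := updateShardStatus(shard); err != nil {
			log.Printf("failed to update status of shard %d (%s): %v", i, shard.Name, err)
			continue
		}
	}
}
```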
Comments suppressed due to low confidence (4)
server/route.go:72
- The migrate route no longer includes RequiredNamespace and RequiredCluster middleware, which may bypass essential validation and context injection. Consider re-adding the appropriate middleware to enforce namespace and cluster context.
clusters.POST("/:cluster/migrate", handler.Cluster.MigrateSlot)
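
A possible shape for restoring the middleware named in this comment, shown as a sketch: the middleware handlers are passed in as assumed `gin.HandlerFunc` values, and the project's actual RequiredNamespace/RequiredCluster signatures may differ.

```go
package server

import "github.com/gin-gonic/gin"

// registerMigrateRoute sketches how namespace and cluster validation could be
// restored on the migrate endpoint before the handler runs.
func registerMigrateRoute(clusters *gin.RouterGroup, requiredNamespace, requiredCluster, migrateSlot gin.HandlerFunc) {
	clusters.POST("/:cluster/migrate", requiredNamespace, requiredCluster, migrateSlot)
}
```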
server/api/cluster.go:50
- [nitpick] The field name `mu` is generic; consider renaming it to something more descriptive like `migrateMu` or `clusterMu` to clarify its purpose.
mu sync.Mutex
store/cluster.go:210
- [nitpick] It may be helpful to add a comment explaining why `ClearMigrateState` is only called for the source shard when `slotOnly` is true, to improve maintainability and clarity of intent.
cluster.Shards[sourceShardIdx].ClearMigrateState()
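
One possible commented version of the call referenced above; `cluster`, `sourceShardIdx`, and `slotOnly` come from the snippet, and the comment wording is only a suggestion.

```go
if slotOnly {
	// A slot-only move bypasses the full migration workflow, so any migration
	// state left on the source shard from an earlier run would now be stale;
	// clear it before reassigning the slot.
	cluster.Shards[sourceShardIdx].ClearMigrateState()
}
```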
server/api/cluster.go:123
- Using a single mutex on the handler serializes all MigrateSlot calls across all clusters, potentially creating a bottleneck. Consider using a more granular lock mechanism (e.g., per-cluster) to allow concurrent migrations on different clusters.
handler.mu.Lock()
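
A sketch of the per-cluster locking suggested above, as an illustrative alternative rather than the PR's implementation: each cluster gets its own mutex, so migrations on different clusters can proceed concurrently while calls for the same cluster stay serialized.

```go
package api

import "sync"

// clusterLocks lazily creates one mutex per cluster name.
type clusterLocks struct {
	locks sync.Map // cluster name -> *sync.Mutex
}

// lock acquires the mutex for the given cluster, creating it on first use,
// and returns it so the caller can defer Unlock.
func (l *clusterLocks) lock(cluster string) *sync.Mutex {
	v, _ := l.locks.LoadOrStore(cluster, &sync.Mutex{})
	mu := v.(*sync.Mutex)
	mu.Lock()
	return mu
}
```

A handler could then call `mu := h.locks.lock(clusterName)` followed by `defer mu.Unlock()` instead of locking a single handler-wide mutex.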